Martyna Stasiak id.156071
To perform the tasks, it is necessary to import the libraries used in the script and download the data on which we will be working.
In this script we will be using: OpenCV (cv2), Matplotlib, NumPy, and Pillow (PIL).
# import the libraries that will be used in this script
import cv2
import matplotlib.pyplot as plt
import numpy as np
import PIL
# download and unpack images
!wget -O lena_std.tif http://www.lenna.org/lena_std.tif
!wget -O bug.zip http://grail.cs.washington.edu/projects/photomontage/data/bug.zip && unzip -o bug.zip
--2024-10-22 20:41:10--  http://www.lenna.org/lena_std.tif
Resolving www.lenna.org (www.lenna.org)... 107.180.37.106
Connecting to www.lenna.org (www.lenna.org)|107.180.37.106|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 786572 (768K) [image/tiff]
Saving to: ‘lena_std.tif’

lena_std.tif        100%[===================>] 768.14K  4.44MB/s    in 0.2s

2024-10-22 20:41:10 (4.44 MB/s) - ‘lena_std.tif’ saved [786572/786572]

--2024-10-22 20:41:11--  http://grail.cs.washington.edu/projects/photomontage/data/bug.zip
Resolving grail.cs.washington.edu (grail.cs.washington.edu)... 128.208.5.93, 2607:4000:200:14::5d
Connecting to grail.cs.washington.edu (grail.cs.washington.edu)|128.208.5.93|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://grail.cs.washington.edu/projects/photomontage/data/bug.zip [following]
--2024-10-22 20:41:11--  https://grail.cs.washington.edu/projects/photomontage/data/bug.zip
Connecting to grail.cs.washington.edu (grail.cs.washington.edu)|128.208.5.93|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 29430116 (28M) [application/zip]
Saving to: ‘bug.zip’

bug.zip             100%[===================>]  28.07M  20.2MB/s    in 1.4s

2024-10-22 20:41:12 (20.2 MB/s) - ‘bug.zip’ saved [29430116/29430116]

Archive:  bug.zip
  inflating: bug/b_bigbug0000_croppped.png
  inflating: bug/b_bigbug0001_croppped.png
  inflating: bug/b_bigbug0002_croppped.png
  inflating: bug/b_bigbug0003_croppped.png
  inflating: bug/b_bigbug0004_croppped.png
  inflating: bug/b_bigbug0005_croppped.png
  inflating: bug/b_bigbug0006_croppped.png
  inflating: bug/b_bigbug0007_croppped.png
  inflating: bug/b_bigbug0008_croppped.png
  inflating: bug/b_bigbug0009_croppped.png
  inflating: bug/b_bigbug0010_croppped.png
  inflating: bug/b_bigbug0011_croppped.png
  inflating: bug/b_bigbug0012_croppped.png
  inflating: bug/result.png
The Colab platform requires a special way to display images with OpenCV. If the notebook is run in Colab, execute the following code:
if "google.colab" in str(get_ipython()):
from google.colab.patches import cv2_imshow
imshow = cv2_imshow
else:
def imshow(img):
cv2.imshow('ImageWindow', img)
cv2.waitKey()
cv2.destroyAllWindows()
A helper function that compares two images, checking whether their values are the same up to minor numerical errors and whether both have elements of type uint8.
def all_close(image_1: np.array, image_2: np.array):
return (
np.allclose(image_1, image_2, 0, 2)
and image_1.dtype == np.uint8
and image_2.dtype == np.uint8
)
Digital images store a representation of color in a certain domain that, to some extent, reflects how humans perceive light. The most intuitive is the spatial domain, in which the image consists of pixels arranged as a 2D matrix. Each pixel has its own intensity value, which can be represented in many ways, for example as a single grayscale intensity or as components in a color space such as RGB, HSV or CMYK.
Different color spaces provide different image processing options. For example, from the HSV space we can determine the brightness directly, while using Grayscale it may be easier to detect the contours of objects in the scene.
Besides the spatial domain, the image can also be processed in the frequency domain. An image stored as a 2D matrix can be treated as a two-dimensional signal, so all signal operations, such as the Fourier transform, apply to it. Representing an image in the frequency domain makes it easier to detect edges and blurred areas and to filter the image.
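The frequency domain is not used in the rest of this notebook, but a minimal sketch of how an image can be moved there with NumPy's FFT might look as follows (an illustrative addition, assuming lena_std.tif has already been downloaded as above):

# Minimal sketch: grayscale image -> centered, log-scaled amplitude spectrum
import cv2
import matplotlib.pyplot as plt
import numpy as np

gray = cv2.imread("./lena_std.tif", 0)           # load directly as grayscale
spectrum = np.fft.fftshift(np.fft.fft2(gray))    # 2D FFT, zero frequency shifted to the center
magnitude = 20 * np.log(np.abs(spectrum) + 1)    # log scale so that the structure is visible

plt.imshow(magnitude, cmap="gray")
plt.title("Amplitude spectrum of the Lenna image")
plt.show()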
RGB color space:
CMYK color space:
The most popular Python libraries for image processing are OpenCV (cv2) and Pillow (PIL).
There are some differences in how these libraries handle images. OpenCV works by default on images in the BGR format, while Pillow uses RGB. BGR is simply the reversed channel order for each pixel (Blue, Green, Red).
# loading the image using the opencv library
img = cv2.imread(
"./lena_std.tif", 1
)  # flag: 1 - color (BGR), 0 - grayscale, -1 - unchanged (used to load the alpha channel)
img = cv2.resize(img, (256, 256))  # resize to 256x256
print("Shape:", img.shape)
print("BGR:", img[0, 0])
print("Image load and displayed by OpenCV\n")
imshow(img)
Shape: (256, 256, 3)
BGR: [125 137 226]
Image loaded and displayed by OpenCV
# By default opencv process images in BGR order of color
# keep this in mind when using other libraries like pillow
# (both libraries read the data into a NumPy array, so it is possible to mix functionality between libraries)
img_pil = np.array(PIL.Image.open("./lena_std.tif"))
img_pil = cv2.resize(img_pil, (256, 256))
print("Shape:", img_pil.shape)
print("RGB:", img_pil[0, 0])
print("Image load by Pillow and displayed by OpenCV\n")
imshow(img_pil)
Shape: (256, 256, 3)
RGB: [226 137 125]
Image loaded by Pillow and displayed by OpenCV
# We can convert the image from RGB to BGR so that it displays properly
imshow(cv2.cvtColor(img_pil, cv2.COLOR_RGB2BGR))
# It is also possible to display an image using the matplotlib library (in RGB color order)
plt.imshow(img_pil)
<matplotlib.image.AxesImage at 0x7d7408de3700>
We can move freely between color spaces during image processing. Moreover, most libraries provide built-in conversions between the most popular spaces.
Assume input values $R, G, B$, where $R, G, B \in [0,1]$ (if they are in the range $[0,255]$, they should first be divided by $255.0$). The conversion from RGB to HSV can then be written as:
$C_{max} = max(R,G,B)$
$C_{min} = min(R,G,B)$
$\Delta = C_{max} - C_{min}$
$$H=\begin{cases} 0 & \text{for}\ \Delta = 0\\ 60 * (\frac{G - B}{\Delta} \bmod 6) & \text{for}\ C_{max} = R\\ 60 * (\frac{B - R}{\Delta} + 2) & \text{for}\ C_{max} = G\\ 60 * (\frac{R - G}{\Delta} + 4) & \text{for}\ C_{max} = B \end{cases}$$

$$S=\begin{cases} 0 & \text{for}\ C_{max} = 0\\ \frac{\Delta}{C_{max}} & \text{otherwise} \end{cases}$$

$$V = C_{max}$$
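As a quick check of these formulas, take a pure red pixel $(R, G, B) = (1, 0, 0)$: then $C_{max} = 1$, $C_{min} = 0$, $\Delta = 1$, so $H = 60 * (\frac{0 - 0}{1} \bmod 6) = 0°$, $S = \frac{\Delta}{C_{max}} = 1$ and $V = 1$, i.e. a fully saturated, maximally bright red.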
The conversion from RGB to Grayscale space can be represented as:
$$Gray = 0.2989 * R + 0.5870 * G + 0.1140 * B$$

OpenCV includes a ready-made function cvtColor, which takes the image to be processed as the first parameter and a constant specifying the type of conversion as the second parameter (constants such as cv2.COLOR_RGB2BGR).
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
img_luv = cv2.cvtColor(img, cv2.COLOR_BGR2LUV)
img_grayscale = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
imshow(np.concatenate([img_rgb, img_hsv, img_luv], 1))
imshow(img_grayscale)
def BGR2RGB(img_bgr):
img_rgb = img_bgr[..., ::-1]
return img_rgb.astype(np.uint8)
def BGR2HSV(img_bgr):
img_bgr = img_bgr.astype(np.float32) / 255.0
B,G,R = cv2.split(img_bgr)
cmax = np.max(img_bgr, axis = -1)
cmin = np.min(img_bgr, axis = -1)
delta = cmax - cmin
V=cmax
S = np.where(delta == 0, 0, delta/cmax)
H = np.where(delta == 0, 0, np.where(cmax == R, 60*(((G-B)/delta) % 6), np.where(cmax == G, 60*(((B-R)/delta) +2), 60*(((R-G)/delta) +4))))
img_hsv = np.stack([H/2, S*255, V*255], axis = -1)
return img_hsv.astype(np.uint8)
def BGR2Gray(img_bgr):
B,G,R = cv2.split(img_bgr)
img_grayscale = 0.2989*R + 0.5870*G + 0.1140*B
return img_grayscale.astype(np.uint8)
img = cv2.imread("./lena_std.tif", 1)
img = cv2.resize(img, (256, 256))
img_rgb_2 = BGR2RGB(img)
img_hsv_2 = BGR2HSV(img)
img_grayscale_2 = BGR2Gray(img)
imshow(np.concatenate([img_rgb_2, img_hsv_2], 1))
imshow(img_grayscale_2)
print("\n===\n")
print("BGR2RGB Check:", (img_rgb == img_rgb_2).all())
print("BGR2HSV Check:", all_close(img_hsv, img_hsv_2))
print("BGR2Grayscale Check:", all_close(img_grayscale, img_grayscale_2))
print("HSV range Check: ", img_hsv_2.min(0).min(0), img_hsv_2.max(0).max(0))
print("HSV range Check: ", img_hsv.min(0).min(0), img_hsv.max(0).max(0)) #for my hsv
===

BGR2RGB Check: True
BGR2HSV Check: False
BGR2Grayscale Check: True
HSV range Check:  [ 0 18 61] [179 241 255]
HSV range Check:  [ 0 19 61] [179 242 255]
Implement the following conversions: BGR to RGB, BGR to HSV, and BGR to Grayscale.
The results will be compared with the results of the functions included in OpenCV.
Note: There may be slight differences in pixel values between the same transforms, which may be due to numerical errors or simply a different type of rounding / truncating the values when transforming a floating point number to an integer.
Note 2: Don't use loops that iterate over each pixel, nor np.apply_along_axis (which is even slower than loops).
Note 3: At the end of the transformation, cast the result to uint8, e.g. img_bgr.astype(np.uint8).
Color spaces such as RGB and HSV are standard spaces that result from the nature of light. However, nothing prevents you from creating your own color spaces. Pseudo-coloring methods are presented below, i.e. assigning pixels a color based on an artificially prepared color map.
The Hot colormap may seem particularly interesting, as it assigns warmer colors (yellow) to pixels of greater intensity (grayscale value).
img_grayscale = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_hot = cv2.applyColorMap(img_grayscale, cv2.COLORMAP_HOT)
img_bone = cv2.applyColorMap(img_grayscale, cv2.COLORMAP_BONE)
img_ocean = cv2.applyColorMap(img_grayscale, cv2.COLORMAP_OCEAN)
imshow(np.concatenate([img, img_hot, img_bone, img_ocean], 1))
To apply our own color space, we can use OpenCV's LUT (lookup table) function. In the example below, three color mappings are prepared (lut_1, lut_2, lut_3). Each of these tables maps each of the 256 possible values (uint8) to a new value. The function can also be applied to multi-channel images.
lut_1 = np.array(range(256))
lut_2 = np.array([255] * 100 + [0] * 100 + [255] * 56)
lut_3 = 64 * (np.array(range(256)) // 64)
img_lut_1 = cv2.LUT(img_grayscale, lut_1)
img_lut_2 = cv2.LUT(img_grayscale, lut_2)
img_lut_3 = cv2.LUT(img_grayscale, lut_3)
imshow(np.concatenate([img_grayscale, img_lut_1, img_lut_2, img_lut_3], 1))
For a Lenna image in Grayscale space, first transform it to a space containing 8 colors (buckets), and then convert the image to a Hot space.
Display intermediate results.
Note: You can use cv2.applyColorMap.
#lut_3 = 64 * (np.array(range(256)) // 64)
lut_4 = 32 * (np.array(range(256)) // 32).astype(np.uint8)  # works like lut_3 above, but creates 8 buckets of 32 values each
img_lut_4 = cv2.LUT(img_grayscale, lut_4)
img_lut_4_HOT = cv2.applyColorMap(img_lut_4, cv2.COLORMAP_HOT)  # apply the Hot colormap to the Lenna image quantized into 8 buckets
imshow(np.concatenate([img_grayscale, img_lut_4], 1))
imshow(img_lut_4_HOT)
# converting to 3 channels so that all of the Lenna images can be displayed next to each other
img_grayscale_color = cv2.cvtColor(img_grayscale, cv2.COLOR_GRAY2BGR)
img_lut_4_color = cv2.cvtColor(img_lut_4, cv2.COLOR_GRAY2BGR)
imshow(np.concatenate([img_grayscale_color, img_lut_4_color, img_lut_4_HOT], axis=1))
A point operation is a transformation that transforms an image into another image, for which the result of a particular pixel depends only on the corresponding pixel in the input image. Formally, any image operation can be written as follows:
$$F : I_{in} \rightarrow I_{out}$$

where $I_{in}$ is the input image and $I_{out}$ is the output image, with a constraint:
$$I_{out}(x,y) = F(I_{in}(x,y))$$

meaning that an output image pixel is the result of the function $F$ applied to the corresponding input image pixel.
Examples of point operations (for simplicity, we assume that $i \in [0, 1]$) are: identity, inversion, gamma correction, low/high thresholding, and a quadratic function.
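Since a point operation on a uint8 image depends only on the 256 possible input values, it can also be precomputed as a lookup table and applied with cv2.LUT, introduced above. A minimal sketch (an added illustration with hypothetical variable names, assuming img_grayscale and imshow from the cells above) for gamma correction with $g = 0.5$:

levels = np.arange(256, dtype=np.float32) / 255.0     # all possible uint8 intensities scaled to [0, 1]
gamma_lut = (255.0 * levels**0.5).astype(np.uint8)    # gamma correction with g = 0.5, mapped back to [0, 255]
img_gamma_lut = cv2.LUT(img_grayscale, gamma_lut)     # same effect as applying the formula to every pixel
imshow(np.concatenate([img_grayscale, img_gamma_lut], 1))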
def plot_simple(ax):
ax.set_title("Basic transformations")
ax.set_xlabel("Input intensity")
ax.set_ylabel("Output intensity")
ax.plot(i, identity(i), label="identity")
ax.plot(i, invert(i), label="inversion")
ax.grid()
ax.legend()
def plot_gamma(ax):
ax.set_title("Gamma correction for diffrent gamma values")
ax.set_xlabel("Input intensity")
ax.set_ylabel("Output intensity")
ax.plot(i, gamma(i, 0.1), label="0.1")
ax.plot(i, gamma(i, 0.2), label="0.2")
ax.plot(i, gamma(i, 0.5), label="0.5")
ax.plot(i, gamma(i, 1.0), label="1.0")
ax.plot(i, gamma(i, 1.8), label="1.8")
    ax.plot(i, gamma(i, 3.0), label="3.0")
ax.plot(i, gamma(i, 4.5), label="4.5")
ax.grid()
ax.legend()
def plot_l_threshold(ax):
ax.set_title("Low-pass filtering")
ax.set_xlabel("Input intensity")
ax.set_ylabel("Output intensity")
ax.plot(i, l_threshold(i, 0.1), label="0.1")
ax.plot(i, l_threshold(i, 0.5), label="0.5")
ax.plot(i, l_threshold(i, 0.9), label="0.9")
ax.grid()
ax.legend()
def plot_h_threshold(ax):
ax.set_title("High-pass filtering")
ax.set_xlabel("Input intensity")
ax.set_ylabel("Output intensity")
ax.plot(i, h_threshold(i, 0.1), label="0.1")
ax.plot(i, h_threshold(i, 0.5), label="0.5")
ax.plot(i, h_threshold(i, 0.9), label="0.9")
ax.grid()
ax.legend()
def plot_quad(ax):
ax.set_title("Quadratic function")
ax.set_xlabel("Input intensity")
ax.set_ylabel("Output intensity")
ax.plot(i, quad(i, 4.0), label="4.0")
ax.plot(i, quad(i, 2.0), label="2.0")
ax.plot(i, quad(i, 1.0), label="1.0")
ax.grid()
ax.legend()
def plot_stacked(ax):
ax.set_title("Transformation stacked")
ax.set_xlabel("Input intensity")
ax.set_ylabel("Output intensity")
ax.plot(
i, h_threshold(gamma(invert(i), 0.3), 0.7), label="threshold(gamma(invert))"
)
ax.plot(i, quad(l_threshold(i, 0.4), 3.0), label="quad(threshold)")
ax.grid()
ax.legend()
Below are the implementations of identity, inversion, and gamma correction operations and their visualizations.
def identity(i):
return i
def invert(i):
return 1.0 - i
def gamma(i, g):
return i**g
i = np.arange(0.0, 1.0, 0.01) # image domain
fig, ax = plt.subplots(1, 2, figsize=(15, 5), sharex="all", sharey="all")
axes = plt.gca()
axes.set_xlim([0.0, 1.0])
axes.set_ylim([0.0, 1.0])
plot_simple(ax[0])
plot_gamma(ax[1])
plt.show()
Implement the following transformations: low threshold (l_threshold), high threshold (h_threshold), and the quadratic function (quad).
# todo:
def l_threshold(i, threshold):
return np.where(i < threshold, threshold, i)
# todo:
def h_threshold(i, threshold):
return np.where(i > threshold, threshold, i)
# todo:
def quad(i, a):
    return a * i * (1 - i)  # i * (1 - i) creates the parabola and a scales its peak value
fig, ax = plt.subplots(2, 2, figsize=(15, 15), sharex="all", sharey="all")
axes = plt.gca()
axes.set_xlim([0.0, 1.0])
axes.set_ylim([0.0, 1.0])
plot_l_threshold(ax[0, 0])
plot_h_threshold(ax[0, 1])
plot_quad(ax[1, 0])
plot_stacked(ax[1, 1])
plt.show()
Above, we have defined the basic image transformation functions. These operations can be directly applied to images, resulting in a new transformed image.
Below are the same functions as above applied to the image of Lenna.
def imshow_simple(img_bgr):
print("\n===")
print("Identity | inversion\n")
img_bgr = img_bgr / 255.0
imshow(np.concatenate([identity(img_bgr), invert(img_bgr)], 1) * 255.0)
def imshow_gamma(img_bgr):
print("\n===")
print("Gamma correction 0.1 | 0.5 | 2.0 | 4.0\n")
img_bgr = img_bgr / 255.0
imshow(
np.concatenate(
[
gamma(img_bgr, 0.1),
gamma(img_bgr, 0.5),
gamma(img_bgr, 2.0),
gamma(img_bgr, 4.0),
],
1,
)
* 255.0
)
def imshow_l_threshold(img_bgr):
print("\n===")
print("Low-pass filter 0.3 | 0.5 | 0.9\n")
img_bgr = img_bgr / 255.0
imshow(
np.concatenate(
[
l_threshold(img_bgr, 0.3),
l_threshold(img_bgr, 0.5),
l_threshold(img_bgr, 0.9),
],
1,
)
* 255.0
)
def imshow_h_threshold(img_bgr):
print("\n===")
print("High-pass filter 0.3 | 0.5 | 0.9\n")
img_bgr = img_bgr / 255.0
imshow(
np.concatenate(
[
h_threshold(img_bgr, 0.3),
h_threshold(img_bgr, 0.5),
h_threshold(img_bgr, 0.9),
],
1,
)
* 255.0
)
def imshow_quad(img_bgr):
print("\n===")
print("Quadratic function 4.0 | 2.0 | 1.0\n")
img_bgr = img_bgr / 255.0
imshow(
np.concatenate([quad(img_bgr, 4.0), quad(img_bgr, 2.0), quad(img_bgr, 1.0)], 1)
* 255.0
)
def imshow_stacked(img_bgr):
print("\n===")
    print(
        "Stack of transformations: h_threshold(gamma(invert)) | quad(l_threshold())\n"
    )
img_bgr = img_bgr / 255.0
imshow(
np.concatenate(
[
h_threshold(gamma(invert(img_bgr), 0.3), 0.7),
quad(l_threshold(img_bgr, 0.4), 3.0),
],
1,
)
* 255.0
)
imshow_simple(img)
imshow_gamma(img)
imshow_l_threshold(img)
imshow_h_threshold(img)
imshow_quad(img)
imshow_stacked(img)
=== Identity | inversion
=== Gamma correction 0.1 | 0.5 | 2.0 | 4.0
=== Low-pass filter 0.3 | 0.5 | 0.9
=== High-pass filter 0.3 | 0.5 | 0.9
=== Quadratic function 4.0 | 2.0 | 1.0
=== Stack of transformations: h_threshold(gamma(invert)) | quad(l_threshold())
Image pixel intensity / frequency values are represented as numbers (integer or floating point). This makes it possible to perform certain operations on pairs (or sets) of images, such as addition, subtraction, averaging, etc.
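One caveat worth noting here (an added side note, not part of the original material): for uint8 images, NumPy's + wraps around on overflow, while OpenCV's cv2.add saturates at 255, so the two can give different results:

a = np.array([[200]], dtype=np.uint8)
b = np.array([[100]], dtype=np.uint8)
print(a + b)          # [[44]]  - NumPy wraps around modulo 256
print(cv2.add(a, b))  # [[255]] - OpenCV saturates the sum at 255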
Below, a naive solution is presented to the problem of merging images that show the same object with different areas in focus.
files = [
"./bug/b_bigbug0000_croppped.png",
"./bug/b_bigbug0001_croppped.png",
"./bug/b_bigbug0002_croppped.png",
"./bug/b_bigbug0003_croppped.png",
"./bug/b_bigbug0004_croppped.png",
"./bug/b_bigbug0005_croppped.png",
"./bug/b_bigbug0006_croppped.png",
"./bug/b_bigbug0007_croppped.png",
"./bug/b_bigbug0008_croppped.png",
"./bug/b_bigbug0009_croppped.png",
"./bug/b_bigbug0010_croppped.png",
"./bug/b_bigbug0011_croppped.png",
"./bug/b_bigbug0012_croppped.png",
]
# load images
bugs = [cv2.imread(f, 1) for f in files]
bugs = list(map(lambda i: cv2.resize(i, None, fx=0.3, fy=0.3), bugs))
#
# loading the expected result of the merging
result = cv2.imread("./bug/result.png", 1)
result = cv2.resize(result, None, fx=0.3, fy=0.3)
We can perform the averaging operation on the loaded images. The expected result is an image with medium sharpness.
If the areas in which each image is sharp were identified, it would be possible to assemble an image that is sharp everywhere (this requires convolution operations, which will be introduced in the next classes).
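As a rough preview of that idea (an added sketch, not the method used later in the course), a per-pixel sharpness measure such as the absolute Laplacian response can be used to pick, for each pixel, the frame in which it appears sharpest:

# Rough sketch: for every pixel, take the value from the frame with the strongest local Laplacian response
sharpness = np.stack(
    [np.abs(cv2.Laplacian(cv2.cvtColor(b, cv2.COLOR_BGR2GRAY), cv2.CV_32F)) for b in bugs], 0
)
best = sharpness.argmax(0)             # index of the sharpest frame for each pixel
stack = np.stack(bugs, 0)              # shape (N, H, W, 3)
rows, cols = np.indices(best.shape)
naive_focus = stack[best, rows, cols]  # assemble a (noisy) all-in-focus image
imshow(naive_focus)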
#
bug = np.stack(bugs, 0).mean(0)
print("\n===")
print("Photos of ants with sharpness at different distances\n")
imshow(np.concatenate(bugs[0:4], 1))
imshow(np.concatenate(bugs[4:8], 1))
imshow(np.concatenate(bugs[8:12], 1))
print("\n===")
print("Averaging of the component images and the target image\n")
imshow(np.concatenate([bug, result], 1))
=== Photos of ants with sharpness at different distances
=== Averaging of the component images and the target image
The task is to practice arithmetic operations on images and simple feature detection based on single-point processing.
Using the Lenna image, find areas in the image for which grayscale values are in the range 120-160.
Then propose operations that will return the inverse of the Lenna image (RGB or BGR for cv2) for selected areas, and copy the pixels from the Lenna image (RGB or BGR for cv2) for the remaining areas.
img = cv2.imread("./lena_std.tif", 1)
img = cv2.resize(img, (256, 256))
img_grayscale = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale (as in a cell above)
pixel_for_inversion = (img_grayscale >= 120) & (img_grayscale <= 160)  # select the pixels to invert (the range given in the task)
img_inverted = 255 - img  # invert the whole image
img_result = img.copy()  # copy of the original Lenna, kept for displaying
img_result[pixel_for_inversion] = img_inverted[pixel_for_inversion]  # replace the selected pixels with their inverted values
#showing just the result
print("Result after inversion for given pixels:")
imshow(img_result)
# creating images that show the selected pixels, to visualize and display them better
white_back = np.ones_like(img) * 255  # white background, to better see the selected pixels of Lenna
white_back[pixel_for_inversion] = img[pixel_for_inversion]  # selected pixels of the original Lenna image on the white background
inverted_white_back = white_back.copy()
inverted_white_back[pixel_for_inversion] = img_inverted[pixel_for_inversion]  # the same selected pixels, but inverted, on the white background
img_grayscale_color = cv2.cvtColor(img_grayscale, cv2.COLOR_GRAY2BGR)  # convert to 3 channels for display purposes
#displaying 'All the steps' just to visualize better and understand
print('\n---------------------------------------------------------------------------------------')
print("All Lena steps\n (1)Original Lena, (2)Grayscale, (3)Selected pixels for inversion, (4)Inverted selected pixels, (5)Result")
imshow(np.concatenate([img, img_grayscale_color, white_back, inverted_white_back ,img_result], 1))
Result after inversion for given pixels:
--------------------------------------------------------------------------------------- All Lena steps (1)Original Lena, (2)Grayscale, (3)Selected pixels for inversion, (4)Inverted selected pixels, (5)Result
In addition to operations modifying a single pixel, there are also those that transform the geometry of the entire image. The basic geometric transformations include translation, rotation, scaling, and shearing.
The above operations are called affine operations and can be represented as:
$$ y = Tx + b$$

where $x$ is the pixel position vector $(i,j)$ in the input image, $y$ is the pixel position vector $(i',j')$ in the output image, $b$ is the translation vector, and $T$ is the transformation matrix.
In order for the affine transformation to take only one parameter $T$, it is necessary to extend the vectors to 3 dimensions (homogeneous coordinates), where the last component of each pixel vector has the value 1.
Thus, the general form of an affine transform takes the form: $$ \begin{bmatrix} i'\\ j'\\ 1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ 0 & 0 & 1 \end{bmatrix}* \begin{bmatrix} i\\ j\\ 1 \end{bmatrix} $$
Then, the basic operations can be defined as:

- translation: $T = \begin{bmatrix} 1 & 0 & t_{i}\\ 0 & 1 & t_{j}\\ 0 & 0 & 1 \end{bmatrix}$
- scaling: $T = \begin{bmatrix} s_{i} & 0 & 0\\ 0 & s_{j} & 0\\ 0 & 0 & 1 \end{bmatrix}$
- rotation: $T = \begin{bmatrix} \cos\theta & -\sin\theta & 0\\ \sin\theta & \cos\theta & 0\\ 0 & 0 & 1 \end{bmatrix}$
- shear: $T = \begin{bmatrix} 1 & sh_{i} & 0\\ sh_{j} & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}$
The above operations can be composed by matrix multiplication.
Note:
OpenCV provides a function for applying affine transformations; however, since the last row always has the same form for the basic operations ($[0, 0, 1]$), it takes the transformation in the form: $$ T = \begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23} \end{bmatrix} $$
The entire matrix is used in more advanced transformations, e.g. in a perspective transformation or homography: $$ T = \begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{bmatrix} $$
# translation
t1 = np.array([[1, 0, 50], [0, 1, -50], [0, 0, 1]], np.float32)
# rotation
t2 = np.array([[0.0, -1.0, 256], [1.0, 0.0, 0], [0, 0, 1]], np.float32)
# scaling
t3 = np.array([[0.5, 0, 0], [0, 0.5, 0], [0, 0, 1]], np.float32)
img_t1 = cv2.warpAffine(img, t1[:2], img.shape[:2])
img_t2 = cv2.warpAffine(img_t1, t2[:2], img_t1.shape[:2])
img_t3 = cv2.warpAffine(img_t2, t3[:2], img_t2.shape[:2])
imshow(np.concatenate([img, img_t1, img_t2, img_t3], 1))
The above example shows that applying affine operations one by one can be lossy. Notice that after the first translation, part of the image falls outside the frame and is lost, which corrupts the later processing steps (even though each individual operation is mathematically correct).
The solution is to combine affine operations using matrix multiplication. Below is a single transform containing all of the above operations, while not losing information between operations.
T = t3 @ t2 @ t1
img_direct = cv2.warpAffine(img, T[:2], img.shape[:2])
imshow(img_direct)
# shear
t4 = np.array([[1, -0.2, 0], [0.1, 1, 0], [0, 0, 1]], np.float32)
img_t4 = cv2.warpAffine(img_direct, t4[:2], img_direct.shape[:2])
imshow(img_t4)
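The affine examples above only ever use the first two rows of $T$. As a small illustration of the full $3 \times 3$ matrix mentioned earlier (an added sketch with hypothetical corner points, not part of the original exercise), a perspective transformation can be applied with cv2.getPerspectiveTransform and cv2.warpPerspective:

# A sketch of a projective (perspective) transformation, where the last row is no longer [0, 0, 1]
src_pts = np.float32([[0, 0], [255, 0], [0, 255], [255, 255]])      # corners of the 256x256 input image
dst_pts = np.float32([[30, 30], [225, 10], [10, 245], [255, 255]])  # where those corners should end up
T_persp = cv2.getPerspectiveTransform(src_pts, dst_pts)             # full 3x3 transformation matrix
img_persp = cv2.warpPerspective(img, T_persp, img.shape[:2])
imshow(np.concatenate([img, img_persp], 1))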
2D Transformations
| Transformation name | included operations | Preserves |
|---|---|---|
| translation | translation | straight lines, parallelism, angles, lengths, orientation |
| rigid (Euclidean) | translation, rotation | straight lines, parallelism, angles, lengths |
| similarity | translation, rotation, scaling | straight lines, parallelism, angles |
| affine | translation, rotation, scaling, affine | straight lines, parallelism |
| projective | translation, rotation, scaling, affine, projective | straight lines |
Computing a histogram consists of counting the number of pixels with each value. In other words, the histogram shows how many pixels of a given intensity there are in the image.
To calculate a histogram of an image, we can use a ready-made function: matplotlib's hist() or numpy's histogram().
h, _, _ = plt.hist(img_grayscale.flatten(), 256, histtype="step")
plt.show()
imshow(img_grayscale)
The histogram of the image reveals some irregularities in how often different pixel intensities occur. When the histogram is unbalanced, and therefore certain intensity ranges dominate the image, we can use histogram equalization so that the transformed image has a more even distribution of intensities.
Thanks to this operation, we can transform very dark images, in which no characteristic points are visible, in such a way that changes in pixel intensity highlight previously invisible changes in the image.
Histogram equalization is implemented in the OpenCV library as equalizeHist which takes an image as input and returns the image after transformation.
img_equalized = cv2.equalizeHist(img_grayscale)
plt.hist(img_equalized.flatten(), 255, histtype="step")
plt.show()
imshow(img_equalized)
To perform histogram equalization manually, first calculate the cumulative distribution function (CDF) of the pixel intensities. The CDF tells us the probability that a randomly selected pixel of the image has an intensity less than or equal to a given intensity.
The next step is to normalize the obtained cumulative distribution (so that the values of its codomain are in the range [0, 255]). In this way, we obtain our own lookup table, as introduced earlier in the class.
Having this table, we can use the LUT() operation to get the image after histogram equalization.
cdf_h = np.cumsum(h)
plt.plot(cdf_h)
cdf_lut = (255 * cdf_h / np.max(cdf_h)).astype(np.uint8)
print(cdf_lut)
imshow(cv2.LUT(img_grayscale, cdf_lut))
[ 0 0 0 0 0 0 0 0 0 0 1 1 1 1 2 3 4 4 5 7 8 10 12 12 14 16 19 21 23 23 25 27 29 30 30 32 33 35 36 37 37 38 39 40 41 42 42 42 43 44 45 46 46 46 47 48 49 50 50 51 52 53 54 54 55 55 56 57 58 58 59 60 61 62 63 63 64 65 67 68 69 69 71 72 74 76 78 78 80 82 84 86 88 88 89 91 92 94 94 95 96 98 99 100 100 101 103 104 106 107 107 109 110 112 114 116 116 118 120 122 124 127 127 129 132 134 136 136 138 140 141 143 145 145 147 149 151 154 156 156 158 161 163 166 168 168 170 172 174 176 179 179 181 184 187 189 192 192 194 196 198 200 200 202 203 205 206 208 208 209 210 211 212 213 213 215 216 217 218 219 219 221 222 223 224 224 224 225 226 227 227 227 228 228 229 230 230 230 231 232 233 234 235 235 236 236 237 238 239 239 240 241 241 242 243 243 244 245 246 247 248 248 249 250 251 251 251 252 252 253 253 253 253 254 254 254 254 254 254 254 254 254 254 254 254 254 254 254 254 254 254 254 254 254 254 255]